Overview of the TREC 2004 Question Answering Track
Author
Abstract
The TREC 2004 Question Answering track contained a single task in which question series were used to define a set of targets. Each series related to a single target and contained factoid and list questions; the final question in each series was an "Other" question that asked for additional information about the target that was not covered by the previous questions in the series. Each question type was evaluated separately, with the final score for a run computed as a weighted average of the different component scores. Applying the combined measure on a per-series basis produces a QA task evaluation that more closely mimics classic document retrieval evaluation.

The goal of the TREC question answering (QA) track is to foster research on systems that return answers themselves, rather than documents containing answers, in response to a question. The track started in TREC-8 (1999), with the first several editions of the track focused on factoid questions. A factoid question is a fact-based, short-answer question such as "How many calories are there in a Big Mac?". The task in the TREC 2003 QA track was a combined task that contained list and definition questions in addition to factoid questions [3]. A list question asks for different instances of a particular kind of information to be returned, such as "List the names of chewing gums", and answering it requires a system to assemble an answer from information located in multiple documents. A definition question asks for interesting information about a particular person or thing, such as "Who is Vlad the Impaler?" or "What is a golden parachute?". Definition questions also require systems to locate information in multiple documents, but in this case the information of interest is much less crisply delineated.

The TREC 2003 track was the first large-scale evaluation of list and definition questions, and its results demonstrated that list and definition questions are not only challenging tasks for systems but also pose evaluation challenges. Definition task scores contained a relatively large error term in comparison to the size of the differences between the scores of different systems. For example, the analysis performed as part of TREC 2003 showed that an absolute difference in scores of 0.1 was needed to have 95% confidence that the comparison reflected a true difference in quality when the test set contained 50 questions, yet relatively few of the runs submitted to TREC 2003 differed by this amount. Reducing the error term requires more definition questions in the test set.

The task for the TREC 2004 QA track was therefore designed to accommodate more definition questions while keeping a mix of different question types. The TREC 2004 test set contained factoid and list questions grouped into series, where each series had a definition target associated with it. Each question in a series asked for some information about the target. In addition, the final question in each series was an explicit "Other" question, to be interpreted as "Tell me other interesting things about this target I don't know enough to ask directly". This last question is roughly equivalent to the definition questions in the TREC 2003 task. The reorganization of the combined task into question series has an important additional benefit: each series is a (limited) abstraction of an information dialog in which the user is trying to define the target.
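As a concrete illustration of the combined measure mentioned above, the short sketch below computes a weighted average of a run's three component scores. The 1/2, 1/4, 1/4 weighting, the score names, and the function are assumptions made for illustration, not definitions taken from this paper.

    # Illustrative sketch only: the weights (1/2 factoid, 1/4 list, 1/4 Other)
    # are an assumption for this example, not the official track definition.
    def combined_score(factoid_accuracy, list_avg_f, other_avg_f,
                       weights=(0.5, 0.25, 0.25)):
        """Weighted average of a run's three component scores."""
        w_fact, w_list, w_other = weights
        return (w_fact * factoid_accuracy
                + w_list * list_avg_f
                + w_other * other_avg_f)

    # Example: a run with factoid accuracy 0.35, list average F 0.20, Other average F 0.25
    print(combined_score(0.35, 0.20, 0.25))  # 0.2875

Computed once over the whole test set this yields a single overall run score; the per-series variant, sketched after Figure 1 below, applies the same combination to each series separately.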
The target and earlier questions in a series provide the context for the current question. Context processing is an important element for question answering systems to possess, but its use has not yet been successfully incorporated into the TREC QA track [2].

The remainder of this paper describes the TREC 2004 QA track in more detail. The next section describes the question series that formed the basis of the evaluation. The following section describes the way the individual question types were evaluated and gives the scores for the runs for each component. Section 3 summarizes the technical approaches used by the systems to answer the questions. Section 4 looks at the advantages of evaluating runs using a per-series combined score rather than an overall combined score. The final section looks at the future of the track.

3 Hale Bopp comet
3.1 FACTOID  When was the comet discovered?
3.2 FACTOID  How often does it approach the earth?
3.3 LIST     In what countries was the comet visible on its last return?
3.4 OTHER

21 Club Med
21.1 FACTOID How many Club Med vacation spots are there worldwide?
21.2 LIST    List the spots in the United States.
21.3 FACTOID Where is an adults-only Club Med?
21.4 OTHER

22 Franz Kafka
22.1 FACTOID Where was Franz Kafka born?
22.2 FACTOID When was he born?
22.3 FACTOID What is his ethnic background?
22.4 LIST    What books did he author?
22.5 OTHER

Figure 1: Sample question series from the test set. Series 3 has a THING as a target, series 21 has an ORGANIZATION as a target, and series 22 has a PERSON as a target.
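Following Section 4's contrast between an overall combined score and a per-series combined score, the sketch below applies the same assumed weighted combination to each series from Figure 1 separately and then averages over series, the way document retrieval evaluations average per-topic scores. The data structure, score names, example numbers, and weights are all illustrative assumptions rather than the paper's official definitions.

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class SeriesScores:
        """Component scores a hypothetical run earned on one question series (cf. Figure 1)."""
        target: str
        factoid_accuracy: float  # fraction of the series' factoid questions judged correct
        list_avg_f: float        # average F over the series' list questions
        other_f: float           # F on the series' final "Other" question

    def series_combined(s, weights=(0.5, 0.25, 0.25)):
        # Same assumed 1/2, 1/4, 1/4 weighting as above, applied to one series.
        w_fact, w_list, w_other = weights
        return w_fact * s.factoid_accuracy + w_list * s.list_avg_f + w_other * s.other_f

    def per_series_run_score(all_series):
        """Average the per-series combined measure, like per-topic averaging in IR."""
        return mean(series_combined(s) for s in all_series)

    # Invented example numbers for the three series shown in Figure 1.
    run = [SeriesScores("Hale Bopp comet", 0.50, 0.30, 0.20),
           SeriesScores("Club Med",        1.00, 0.10, 0.40),
           SeriesScores("Franz Kafka",     0.75, 0.25, 0.15)]
    print(per_series_run_score(run))

Averaging per series rather than pooling all questions of a type gives every target equal weight in the final score, which is the sense in which the per-series measure mimics classic per-topic document retrieval evaluation.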
Similar resources
ECNU at TREC 2015: LiveQA Track
This paper reports on East China Normal University's participation in the TREC 2015 LiveQA track. An overview is presented to introduce our community question answering system and discuss the technologies. This year, the TREC LiveQA track expands the traditional QA track, focusing on "live" question answering for real-user questions. For this challenge, we built a real-time community question a...
The TREC-8 Question Answering Track Report
The TREC-8 Question Answering track was the first large-scale evaluation of domain-independent question answering systems. This paper summarizes the results of the track by giving a brief overview of the different approaches taken to solve the problem. The most accurate systems found a correct response for more than 2/3 of the questions. Relatively simple bag-of-words approaches were adequate for ...
Question Answering: CNLP at the TREC-10 Question Answering Track
This paper describes the retrieval experiments for the main task and list task of the TREC-10 question answering track. The question answering system described automatically finds answers to questions in a large document collection. The system uses a two-stage retrieval approach to answer finding based on matching of named entities, linguistic patterns, and keywords. In answering a question, the...
NTT Question Answering System in TREC 2001
In this report, we describe our question answering system SAIQA-e (System for Advanced Interactive Question Answering in English), which ran the main task of TREC-10's QA track. Our system has two characteristics: (1) named entity recognition based on support vector machines and (2) heuristic apposition detection. The MRR score for the main task is 0.228, and experimental results indicate the effec...
Tequesta: The University of Amsterdam's Textual Question Answering System
We describe our participation in the TREC-10 Question Answering track. All our runs used the Tequesta system; we provide a detailed account of the natural language processing and inferencing techniques that are part of Tequesta. We also summarize and discuss our results, which concern both the main task and the list task.